Multimedia inclusion and referencing issues
These issues have been discussed in the following topics:
message/view/GOALS/29930043
message/view/GOALS/30141635
Please don’t write more about this issue there; use the Discussion tab for this page.
The following is just a first attempt to capture some of the issues.
1. Introduction
The goals say that BetterGEDCOM (BG) should include a standard container specification to accommodate ancillary multimedia resources. By multimedia we mean digital resources that may represent photos, scanned images, video, sound, documents, web pages, diagrams, maps, (databases?) etc. Importantly, we need the ability to incorporate resources in formats that are as yet unknown to us (such as emerging audio codecs). Some resources (e.g., video) may demand high performance, which means we must not introduce significant overhead for media objects.
A resource may reside in a file or in information available via an internal computer interface or via a data network.
1.1 Other wiki references:
- a. A separate wiki page, Container & File Issues, was created by jbbeni, with contributions by Russ and Greg; see also the discussion attached to that page, "Zip?"
2. User requirements
2.1 A solution should allow
a. internal references: transfer of multimedia files together with the genealogy data
b. external references: references to multimedia resources that are not transferred together with the genealogy data. These resources may be stored on some media or computer, or may be accessible via a data network
c. many-to-one references: resources may be referenced by many types of entities in the genealogy data, e.g. a source, person, place, excerpt, note? etc. (to be discussed). Multiple entities should be able to “share” (point to) a common reference structure when the object is the same, possibly part of a “media-s” structure within the genealogy data file, that supports organization/management/grouping of the media references.
<<We may need a separate page for transfer of multimedia management information in the genealogy data file>>
d. Uniform Resource Identifiers: identification of the resource (within one or more contexts)
e. ??grouping of media objects that should be accessed at the same time, e.g. photo and sound – cf. Gedcom 5.5.1
f. Resource descriptive information: transfer of information about the object: object type (MIME?), origin/creator/author/publisher, (file) size, title, description, caption, creation time, identification of e.g. persons shown, type of “objects” shown in media (e.g. persons, landscapes, houses), copyright, informal/short identifier/name, setting (type of circumstances/event when created), user defined attributes and attribute types/flags, quality classification, creating program name & version, tags, research notes, duration – and more – or less.
Discussion: Some of this information may duplicate other info in the genealogical data or in the object. This may or may not be desirable in some cases.
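As a starting point for discussion, the many-to-one references (c) and the descriptive information (f) could be sketched as below. This is only an illustration: the names MediaRecord, Entity, media_refs and entities_using are hypothetical, and only a handful of the attributes listed in (f) are shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MediaRecord:
    """One shared description of a multimedia resource (hypothetical layout)."""
    media_id: str                      # identifier unique within the data file
    uri: str                           # where the resource can be found
    mime_type: Optional[str] = None    # object type, e.g. "image/jpeg"
    title: Optional[str] = None
    caption: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    user_attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Entity:
    """Any genealogy entity (person, source, place, ...) that can cite media."""
    entity_id: str
    kind: str
    media_refs: List[str] = field(default_factory=list)  # media_id values only


def entities_using(media_id: str, entities: List[Entity]) -> List[str]:
    """Return the ids of all entities that point at the given media record."""
    return [e.entity_id for e in entities if media_id in e.media_refs]


# Two persons share one photo; the description is stored only once.
photo = MediaRecord("M1", "media/photos/wedding.jpg", "image/jpeg",
                    title="Wedding, 1921", tags=["wedding", "group portrait"])
media_index = {photo.media_id: photo}

entities = [
    Entity("P1", "person", ["M1"]),
    Entity("P2", "person", ["M1"]),
    Entity("S1", "source", []),
]
```

Because entities hold only the identifier, the description of the photo lives in one place, which also gives a natural home for the organization/grouping information mentioned in (c).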
2.2 A solution could also allow
g. reference to one or more specific parts of the resource in a type-specific way
h. integrity and authenticity checks: the receiver should be able to check the integrity and authenticity of the object
i. access control: access to resources could be restricted
j. transfer of multimedia in a program-to-program exchange of genealogy data (this may be outside the scope, but the solution could be influenced by such future capabilities)
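For (h), one possible approach to the integrity part is to record a digest of each object in the genealogy data file and let the receiver recompute it. The sketch below assumes SHA-256 and uses hypothetical function names. Note that a plain digest only detects accidental or unsigned changes; checking authenticity would additionally need something like a digital signature, which is not shown.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large media need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify(stream, expected_hex):
    """Return True when the stream's digest matches the recorded one."""
    return sha256_of_stream(stream) == expected_hex.lower()


# The sender records the digest; the receiver recomputes and compares.
original = b"pretend these bytes are a scanned image"
recorded = sha256_of_stream(io.BytesIO(original))
```

Reading in chunks keeps memory use constant, which matters for the large video files mentioned in the introduction.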
3. Technical solutions
3.1 Internal files
These are files transferred together with the genealogy data file, in a Container file. (Multimedia objects are not stored in XML in the genealogy data file due to efficiency considerations.)
a. The container file type should be standardized, widely available, application independent, license free, platform independent, durable, possibly compressing (and optionally secure).
b. It should allow a hierarchy of files
c. All implementations must support one default Container file type
d. The Container type could be assigned a file extension, e.g. “.gedsomething” (.gedz?)
e. The genealogical data file may be transferred outside of a container if there are no multimedia objects to be transferred (as a .ged file today).
f. The container must support long file names using a defined set of characters, handle the path separator problem (/ versus \), and support file attributes.
Discussion: One possible Container file type is zip (possibly with an internal packaging structure). The Open Packaging Conventions (OPC) has been mentioned in this context, and there are most likely others, such as OOXML (Office Open XML, aka Open XML) and ODF (OpenDocument Format). Another candidate is an "ISO image" (used for images of CDs/DVDs; ISO-9660 based or UDF based?).
Expertise needed! Some existing standards may be too complex, but maybe it is possible to specify restrictions on their use. Some of them may contain other useful functionality not mentioned on this page.
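To make the zip option concrete, here is a minimal sketch of packing a genealogy data file and its media into one container with a file hierarchy. It assumes a plain zip with no extra packaging structure; the file name data.xml and the functions archive_name and build_container are made up for illustration, not a proposed standard.

```python
import io
import zipfile


def archive_name(path):
    """Use forward slashes inside the archive, whatever the source platform."""
    return path.replace("\\", "/").lstrip("/")


def build_container(data_file_bytes, media):
    """Pack the genealogy data file and media into one zip, returned as bytes.

    `media` maps a relative path to the raw bytes of that file.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as container:
        # The data file is text, so compressing it is worthwhile.
        container.writestr("data.xml", data_file_bytes,
                           compress_type=zipfile.ZIP_DEFLATED)
        for path, payload in media.items():
            # Most media formats are already compressed; store them as-is
            # to avoid needless overhead.
            container.writestr(archive_name(path), payload,
                               compress_type=zipfile.ZIP_STORED)
    return buffer.getvalue()


blob = build_container(
    b"<bettergedcom/>",
    {"media\\photos\\wedding.jpg": b"\xff\xd8 fake jpeg bytes"},
)
with zipfile.ZipFile(io.BytesIO(blob)) as container:
    names = container.namelist()
    photo_bytes = container.read("media/photos/wedding.jpg")
```

Normalizing every stored name to forward slashes is one simple way to sidestep the / versus \ problem in (f): the archive always uses one separator, and each receiving program maps it to its own platform convention.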
3.2 External references
<<Needs much more work and discussion>>
An external reference needs to identify the location of the object (e.g. a URL/URI), the access method if any (e.g. HTTP), and possibly the transfer method (e.g. e-mail) and transfer time. Alternatively, it may indicate that the identifier of the object should be used to obtain this info, if the object is stored on the receiver's system. This applies, for example, if the submitter has previously received the media from the receiver and thus does not need to return them.
3.3 Considerations independent of “internal” and “external”
The structure in the genealogy data file that references/describes the multimedia objects should in principle be the same for internal and external objects, but some info will be different.
Both the internal and external methods may be used in the same genealogical data file.
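One way to keep the reference structure the same for both cases is to store a single URI-style string and derive the rest from it. The sketch below is an assumption, not a decided rule: it treats a reference without a scheme as a path inside the container, and anything with a scheme as external, with the scheme as the access method. The function name describe_reference and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit


def describe_reference(uri):
    """Split a reference into kind, access method and location.

    A reference without a scheme is treated here as a path relative to the
    container (an assumption made for this sketch).
    """
    parts = urlsplit(uri)
    if not parts.scheme:
        return {"kind": "internal", "access": None, "location": uri}
    return {
        "kind": "external",
        "access": parts.scheme,
        "location": parts.netloc + parts.path,
    }


internal = describe_reference("media/photos/wedding.jpg")
external = describe_reference("https://example.org/archive/census1900.png")
```

With this approach the descriptive information stays identical for both kinds of object, and only the location part differs, as suggested above.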
4. Backward compatibility issues
None? What about Gedcom 5.5.1?
5. Grouping of functionality/support levels
<<Something to think about later>>
Consider this: Many software applications burn CDs for data transport, but this data (typically) can't then be imported or exported. However, an "ISO image", a format used to store images of CDs and DVDs, is a single file that has its own filesystem independent of any one operating system (though it is not normally compressed). (File systems used for ISO images are either ISO-9660 based or UDF based, I think.)
The issue with file formats is really one about what is easier for developers to adopt. I don't think users care much, as long as it doesn't mess with the data. Obviously naming conventions would be an issue, as well as any filesystem structure or conflicts between names if any filesystem structure changes occurred.
I wonder if using an ISO image format is a good option. Anyway, I look forward to seeing what those who know a lot more about this than I do have to say.